PLCFRS Parsing Revisited: Restricting the Fan-Out to Two
Abstract
A Linear Context-Free Rewriting System (LCFRS) is an extension of a Context-Free Grammar (CFG) in which a non-terminal can dominate more than a single continuous span of terminals. Probabilistic LCFRS (PLCFRS) have recently been used successfully for the direct data-driven parsing of discontinuous structures. In this paper, we present a parser for binary PLCFRS of fan-out two, together with a novel monotone estimate for A* parsing. We use it to conduct experiments on modified versions of the German NeGra treebank and the Discontinuous Penn Treebank in which all trees have block degree two. The experiments show that, compared to previous work, our approach provides an enormous speed-up while delivering an output of comparable richness.
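As a rough illustration of what "block degree two" means, the sketch below counts the maximal contiguous spans covered by a constituent's terminals. The function name `block_degree` and the example yields are hypothetical and are not taken from the paper; the sketch only assumes that a yield is given as a collection of 0-based token positions.

```python
def block_degree(positions):
    """Count the maximal contiguous runs in a collection of token positions."""
    ordered = sorted(set(positions))
    if not ordered:
        return 0
    # A new block starts wherever a position is not the immediate
    # successor of the previous one.
    gaps = sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur != prev + 1)
    return gaps + 1


# Hypothetical yields, given as 0-based token indices.
continuous = [2, 3, 4]        # a single span, as any CFG constituent has
discontinuous = [0, 1, 4, 5]  # two spans separated by a gap

print(block_degree(continuous))
print(block_degree(discontinuous))
```

Under this reading, a constituent whose yield has two blocks would need a non-terminal of fan-out two, which is the upper bound the parser in the abstract is restricted to.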
Similar resources
PLCFRS Parsing of English Discontinuous Constituents
This paper proposes a direct parsing of non-local dependencies in English. To this end, we use probabilistic linear context-free rewriting systems for data-driven parsing, following recent work on parsing German. In order to do so, we first perform a transformation of the Penn Treebank annotation of non-local dependencies into an annotation using crossing branches. The resulting treebank can be...
Data-driven Parsing using Probabilistic Linear Context-Free Rewriting Systems
This paper presents the first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS). LCFRS, an extension of CFG, can describe discontinuities in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. To speed up parsing, we use different context-summary estimates of parse items, some o...
Optimal Parsing Strategies for Linear Context-Free Rewriting Systems
Reduction is the operation of transforming a production in a Linear Context-Free Rewriting System (LCFRS) into two simpler productions by factoring out a subset of the nonterminals on the production’s right-hand side. Reduction lowers the rank of a production but may increase its fan-out. We show how to apply reduction in order to minimize the parsing complexity of the resulting grammar, and stu...
Data-Driven Parsing with Probabilistic Linear Context-Free Rewriting Systems
This paper presents a first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), together with context-summary estimates for parse items used to speed up parsing. LCFRS, an extension of CFG, can describe discontinuities both in constituency and dependency structures in a straightforward way and is therefore a natural candid...
Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
Linear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand side in LCFRSs with fan-out at most 2. This results in asymptotical ru...